Zero-Shot Event Detection by Multimodal Distributional Semantic Embedding of Videos
Authors
Abstract
We propose a new zero-shot event detection method based on multimodal distributional semantic embedding of videos. Our model embeds object and action concepts, as well as other available modalities, from videos into a distributional semantic space. To our knowledge, this is the first zero-shot event detection model built on top of distributional semantics, and it extends that framework in the following directions: (a) semantic embedding of multimodal information in videos (with a focus on the visual modalities), (b) automatically determining the relevance of concepts/attributes to a free-text query, which could be useful for other applications, and (c) retrieving videos by a free-text event query (e.g., "changing a vehicle tire") based on their content. We embed videos into a distributional semantic space and then measure the similarity between the videos and the event query given in free-text form. We validated our method on the large TRECVID MED (Multimedia Event Detection) challenge. Using only the event title as a query, our method outperformed the state-of-the-art approach, which uses full event descriptions, improving MAP from 12.6% to 13.5% and ROC-AUC from 0.73 to 0.83. It is also an order of magnitude faster.
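The core retrieval step described above can be illustrated with a small sketch: embed the video as a confidence-weighted average of the word vectors of its detected concepts, embed the free-text query as the average of its word vectors, and rank videos by cosine similarity. All vectors, concept names, and confidences below are toy values invented for illustration; a real system would use high-dimensional pre-trained distributional embeddings.

```python
from math import sqrt

# Toy 3-d "distributional" word vectors (hypothetical; real systems
# would use e.g. 300-d skip-gram embeddings).
WORD_VEC = {
    "tire":    [0.9, 0.1, 0.0],
    "vehicle": [0.8, 0.2, 0.1],
    "wrench":  [0.7, 0.0, 0.3],
    "cake":    [0.0, 0.9, 0.2],
    "candle":  [0.1, 0.8, 0.3],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def embed_query(words):
    """Average the word vectors of the free-text event query,
    skipping out-of-vocabulary words."""
    vecs = [WORD_VEC[w] for w in words if w in WORD_VEC]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def embed_video(concept_scores):
    """Map concept-detector confidences into the semantic space as a
    confidence-weighted average of the concept word vectors."""
    total = sum(concept_scores.values())
    dim = len(next(iter(WORD_VEC.values())))
    v = [0.0] * dim
    for concept, score in concept_scores.items():
        for i, c in enumerate(WORD_VEC[concept]):
            v[i] += score * c
    return [c / total for c in v]

# Video A fires tire-change-like detectors; Video B birthday-like ones.
video_a = embed_video({"tire": 0.8, "wrench": 0.6, "vehicle": 0.7})
video_b = embed_video({"cake": 0.9, "candle": 0.7})
query = embed_query(["changing", "vehicle", "tire"])

# Zero-shot ranking: Video A should score higher for this query.
assert cosine(video_a, query) > cosine(video_b, query)
```

This is only a sketch of the similarity-in-embedding-space idea; the paper's actual model additionally fuses other modalities and learns concept-to-query relevance automatically.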
Similar Resources
Language Guided Visual Perception
People typically learn through exposure to visual stimuli associated with linguistic descriptions. For instance, teaching visual concepts to children is often accompanied by descriptions in text or speech. This motivates the question of how this learning process could be computationally modeled. In this dissertation we explored three settings, where we showed that combining language and vision ...
VideoStory Embeddings Recognize Events when Examples are Scarce
This paper aims for event recognition when video examples are scarce or even completely absent. The key in such a challenging setting is a semantic video representation. Rather than building the representation from individual attribute detectors and their annotations, we propose to learn the entire representation from freely available web videos and their descriptions using an embedding between...
Semantic Concept Discovery for Large-Scale Zero-Shot Event Detection
We focus on detecting complex events in unconstrained Internet videos. While most existing works rely on the abundance of labeled training data, we consider a more difficult zero-shot setting where no training data is supplied. We first pre-train a number of concept classifiers using data from other sources. Then ...
Multi-Mode Semantic Cues Based on Hidden Conditional Random Field in Soccer Video
We propose a new framework based on multimodal semantic cues and a Hidden Conditional Random Field (HCRF) for detecting highlight events in soccer video. Through analysis of the structural semantics of highlight-event videos, we define nine kinds of multimodal semantic cues to accurately describe the semantic information contained in the highlights. After splitting the video clips into several physical shots,...
Dynamic Concept Composition for Zero-Example Event Detection
In this paper, we focus on automatically detecting events in unconstrained videos without the use of any visual training exemplars. In principle, zero-shot learning makes it possible to train an event detection model based on the assumption that events (e.g. birthday party) can be described by multiple mid-level semantic concepts (e.g. “blowing candle”, “birthday cake”). Towards this goal, we f...
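The concept-composition idea summarized above can be sketched as follows: an event score for a video is a relevance-weighted sum of mid-level concept detector outputs, where relevance comes from semantic similarity between each concept name and the event text. The concept names, relevance weights, and detector scores below are all invented for illustration.

```python
# Hypothetical zero-example event scoring via concept composition.
# Relevance of each pre-trained concept detector to the target event
# (in practice derived from semantic similarity to the event text).
relevance = {
    "blowing candle": 0.9,
    "birthday cake":  0.8,
    "car engine":     0.05,
}

def event_score(detector_scores):
    """Combine per-concept detector confidences for one video into a
    single event score using the relevance weights."""
    return sum(relevance[c] * s for c, s in detector_scores.items())

# Toy per-video detector confidences.
party_video = {"blowing candle": 0.7, "birthday cake": 0.6, "car engine": 0.1}
garage_video = {"blowing candle": 0.05, "birthday cake": 0.0, "car engine": 0.9}

# For a "birthday party" query, the party video should rank higher.
assert event_score(party_video) > event_score(garage_video)
```

The cited paper goes further by composing concepts dynamically per query; this sketch only shows the static weighted-sum baseline that such methods build on.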
Publication date: 2016